- Newsgroups: comp.lang.c++,comp.programming
- Path: uu4news.netcom.com!friend!news
- From: rich@kastle.com (Richard Krehbiel)
- Subject: Re: Why are 32 bit better than 16 bit pgms?
- Message-ID: <1996Jan19.130447.12215@friend.kastle.com>
- Sender: news@friend.kastle.com (News)
- Reply-To: rich@kastle.com
- Organization: Kastle Development Associates
- X-Newsreader: Forte Free Agent 1.0.82
- References: <30FBFFE6.1FEB@netcom.com>
- Date: Fri, 19 Jan 1996 13:03:41 GMT
-
- "Keith S." <vain@netcom.com> wrote:
-
- >I have a simple questions:
-
- > What's are 32 bit pgms better than 16 bit programs?
-
- > Thanks!
-
- Well, I have two 16-bit contexts I can relate to: PDP-11 and Intel.
-
- The PDP-11 is truly a 16 bit CPU. It has a 64K address space, period.
- Going to 32 bits means being able to use more than 64K. Simple enough
- to understand, eh? (Well... PDP-11s usually have a protected MMU,
- visible to and usable only by the OS, that can address up to 4M, and
- the OS usually offers services to "bank-swap" parts of the app's
- memory space.)
-
- The Intel is a little trickier. Simple answer: Addressing large memory
- is *much* faster in 32 bits than in 16 bits, which must deal with
- things as if it were multiple 64K pieces. Read on for details...
-
- It was born with segment:offset addressing, where a pointer to memory
- has two 16 bit parts. Your instructions run faster when you only
- consider and manipulate the 16 bit offset part. This is called a
- "near pointer", when only the 16 bit offset is known and the segment
- part is "assumed". To increment a near pointer is one fast
- instruction. Of course, since a near pointer is only 16 bits, there
- can only be 64K of "near" memory - not enough.
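
A minimal sketch of the near-pointer idea in C++, modeling only the 16 bit offset. The names `NearOffset`, `next_near` and `kNearReach` are made up for illustration; real compilers of the era used their own `near` keywords, which are not shown here.

```cpp
#include <cstdint>

// Hypothetical model of a "near" pointer: only the 16 bit offset is stored.
// The segment is whatever the relevant segment register already holds.
using NearOffset = std::uint16_t;

// Incrementing is plain 16 bit arithmetic, so it wraps around at 64K.
constexpr NearOffset next_near(NearOffset off) {
    return static_cast<NearOffset>(off + 1u);
}

// Number of distinct bytes a 16 bit offset can name inside one segment.
constexpr std::uint32_t kNearReach = 1u << 16;  // 65536
```

Stepping past offset 0xFFFF lands back on 0 in the same segment, which is why 64K is the ceiling for "near" data.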
-
- If you want to deal with memory objects larger than 64K you have to
- use a "far" pointer, which includes both the 16 bit segment and 16 bit
- offset. To increment a far pointer, you increment the offset until it
- overflows, then you change the segment. This is *far* slower. First
- of all you have to look for overflow from the inc, meaning a
- conditional branch, the kind of thing that defeats instruction
- prefetch and pipelines. And when I say "change the segment" I'm
- talking about a serious operation. First of all, the segment part in
- a protected mode program is an MMU table subscript. The MMU owner
- (the OS) defines what segment points to what memory and what kind of
- memory it is. You can't just inc the segment value (or add 0x1000 like
- in real mode) to address the next consecutive memory location after an
- offset overflow, you have to know the OS's convention for allocating
- segment values.
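
The far-pointer increment can be sketched like this for the real-mode case only, where an address is formed as segment * 16 + offset, so the next 64K chunk starts 0x1000 segment units later. `FarPtr`, `linear` and `next_far_real_mode` are hypothetical names, and the sketch deliberately ignores protected mode, where the new segment value would have to come from the OS's allocation convention.

```cpp
#include <cstdint>

// Hypothetical model of a 16:16 far pointer.
struct FarPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Real-mode address formation: segment * 16 + offset.
constexpr std::uint32_t linear(FarPtr p) {
    return (static_cast<std::uint32_t>(p.segment) << 4) + p.offset;
}

// Real mode only: bump the offset, and on wraparound move the segment
// forward by 64K (0x1000 paragraphs of 16 bytes each).
inline FarPtr next_far_real_mode(FarPtr p) {
    p.offset = static_cast<std::uint16_t>(p.offset + 1u);
    if (p.offset == 0) {  // the overflow test is the conditional branch
        p.segment = static_cast<std::uint16_t>(p.segment + 0x1000u);
    }
    return p;
}
```

Compare this with a near increment: there is an extra test on every step, plus a segment change whenever the offset wraps.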
-
- Now add to that the fact that the simple-looking 16 bit segment load
- operation invokes complicated processor behavior. Remembering that a
- segment value is really an MMU table subscript, when you load the
- segment register, behind the scenes the CPU checks the value against
- the LDT/GDT to see if it's legal, then it fetches the 8 byte MMU table
- entry into a segment cache, and while it does it performs some other
- validity checks as well. The result is that segment register loading
- is a SLOW instruction.
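
To make "MMU table subscript" concrete, here is a rough sketch of how a protected-mode segment value (a selector) splits into fields, and of the bounds check against the table. The struct and function names are invented for illustration, and the real CPU performs more checks (privilege, present bit, segment type) that are left out here.

```cpp
#include <cstdint>

// Hypothetical decoded view of a 16 bit protected-mode selector.
struct Selector {
    std::uint16_t index;    // bits 15..3: subscript into the descriptor table
    bool          use_ldt;  // bit 2: false = GDT, true = LDT
    std::uint8_t  rpl;      // bits 1..0: requested privilege level
};

constexpr Selector decode(std::uint16_t raw) {
    return Selector{static_cast<std::uint16_t>(raw >> 3),
                    (raw & 0x4) != 0,
                    static_cast<std::uint8_t>(raw & 0x3)};
}

// Each table entry is 8 bytes. table_limit is the offset of the last valid
// byte in the table, so the whole 8 byte entry must fit below it.
constexpr bool fits_in_table(Selector s, std::uint32_t table_limit) {
    return static_cast<std::uint32_t>(s.index) * 8u + 7u <= table_limit;
}
```

For example, the raw value 0x001F decodes to index 3 in the LDT with privilege level 3, and it is only legal if the table holds at least four entries.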
-
- My 386 reference says moving a register to another register takes two
- clock cycles, unless the destination is a segment register, in which
- case it takes 18. It takes nine times as long. Sheesh. A "near"
- subroutine call takes 7 cycles. A far subroutine call takes 34 -
- about five times slower. I don't have any newer references; perhaps
- the differences are smaller in the Pentium.
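
The arithmetic behind those ratios, as a tiny sketch. The cycle counts are the ones quoted above from the 386 reference; the constant and function names are just for illustration.

```cpp
// Cycle counts as quoted in the post (386, protected mode).
constexpr int kMovRegToReg    = 2;
constexpr int kMovRegToSegReg = 18;
constexpr int kNearCall       = 7;
constexpr int kFarCall        = 34;

// How many times slower the "slow" form is than the "fast" one.
constexpr double slowdown(int slow_cycles, int fast_cycles) {
    return static_cast<double>(slow_cycles) / fast_cycles;
}
```

`slowdown(kMovRegToSegReg, kMovRegToReg)` is exactly 9, and `slowdown(kFarCall, kNearCall)` is a bit under 4.9, hence "about five times".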
-
- Now for 32 bit mode:
-
- In 32 bit mode, suddenly the offsets are 32 bits. Now a "near"
- pointer (where the segment part is just assumed) is 32 bits and is
- large enough to address all the memory. Hey, the segment registers
- are still there and still fully functional; the 386 can support
- multiple 4G segments. However, ALL the 32 bit OS designers said "f*ck
- that" and gave applications only a single segment of enormous size.
-
- Suddenly life is good. Pointer math for large objects is of the
- simple one-instruction kind, as fast as any other math. Pointer
- loading is as fast as any other 32 bit load. No more segment override
- prefixes. In 16 bit mode it's architecturally impossible to have a
- stack larger than 64K, but in 32 bit mode it's as large as, well,
- every other segment (there's but one, remember?). Things are simpler
- and faster, and you can (almost) forget that the abomination of
- segments ever existed.
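
In a flat 32 bit (or larger) address space, walking an object bigger than 64K looks like the sketch below: one plain pointer, one increment, no segment bookkeeping. The function name `sum_bytes` is made up for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sums every byte of a buffer using a single flat pointer.
// The buffer may be far larger than 64K; nothing special is needed
// when the pointer crosses a 64K boundary.
inline std::uint64_t sum_bytes(const std::vector<std::uint8_t>& buf) {
    std::uint64_t total = 0;
    const std::uint8_t* p = buf.data();
    const std::uint8_t* end = p + buf.size();
    for (; p != end; ++p) {
        total += *p;
    }
    return total;
}
```

Calling it on a 200,000 byte vector works the same way as on a 200 byte one.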
-
- On the other hand... Suppose you have a program that's small, simple
- and fast, it never needed more than 64K of memory and it doesn't count
- higher than 64 thousand. 32 bits won't make it faster, in fact it'll
- make it larger and slower. Sorry.
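
One plausible contributor to "larger" is simply that the same values stored in 32 bit fields take twice the room of 16 bit fields. A small sketch, with hypothetical table types and an arbitrary element count:

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCount = 1000;  // arbitrary size for illustration

// The same 1000 small counters, stored as 16 bit and as 32 bit values.
struct Table16 { std::uint16_t values[kCount]; };
struct Table32 { std::uint32_t values[kCount]; };

// How many bytes the wider representation adds for the same data.
constexpr std::size_t extra_bytes() {
    return sizeof(Table32) - sizeof(Table16);
}
```

On typical platforms `sizeof(Table32)` is twice `sizeof(Table16)`, and none of the extra bytes buy anything if no value ever exceeds 65535.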
-
- Oh, let me just throw in the newest good reason why 32 bits is faster
- than 16 bits. The new Pentium Pro slows down on 16 bit code. It uses
- fascinating new technology, replete with all the latest buzzwords; x86
- instructions are decomposed into micro-ops and scheduled out-of-order
- to multiple functional units. There's a little problem with it,
- however. Its fancy data paths are designed for carrying 32 bit
- values around. If forced to operate on pieces of whole registers,
- like it would in 16 bit mode, then its registers can't rename and
- its pipelines stall until it can make whole 32 bit results from 16
- bit operations.
-
- --
- Richard Krehbiel, Kastle Systems, Arlington VA USA
- rich@kastle.com (work) or richk@mnsinc.com (personal)
-
-